GNUsmail: Open Framework for On-line Email Classification
Abstract
Real-time classification of massive email data is a challenging task with its own particular difficulties. Because email data has an important temporal component, several problems arise: emails arrive continuously, and the criteria used to classify them can change over time, so learning algorithms must be able to deal with concept drift. Our problem is more general than spam detection, which has received much more attention in the literature. In this paper we present GNUsmail, an open-source, extensible framework for email classification whose structure supports incremental and on-line learning. The framework enables the incorporation of algorithms developed by other researchers, such as those included in WEKA and MOA. We evaluate the framework, which is characterized by two overlapping phases (pre-processing and learning), using the ENRON dataset, and we compare the results achieved by WEKA and MOA algorithms.
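As a rough illustration of what incremental, on-line learning over an email stream can look like, the following minimal sketch classifies each message into a folder before learning from it (a test-then-train, or prequential, loop). It is not GNUsmail's code and does not use the WEKA or MOA APIs: the class `OnlineNaiveBayes`, the function `prequential_accuracy`, the whitespace tokenizer, and the toy messages are all assumptions made for this example. The sketch also does nothing specific about concept drift; it only shows a model being updated one message at a time.

```python
import math
from collections import Counter, defaultdict


class OnlineNaiveBayes:
    """Hypothetical multinomial Naive Bayes updated one message at a time."""

    def __init__(self):
        self.class_counts = Counter()            # messages seen per folder
        self.word_counts = defaultdict(Counter)  # word frequencies per folder
        self.total_words = Counter()             # total word count per folder
        self.vocabulary = set()                  # all words seen so far

    def learn(self, words, folder):
        """Update the counts with a single labelled message."""
        self.class_counts[folder] += 1
        for word in words:
            self.word_counts[folder][word] += 1
            self.total_words[folder] += 1
            self.vocabulary.add(word)

    def predict(self, words):
        """Return the most probable folder, or None before any training."""
        if not self.class_counts:
            return None
        seen = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary) or 1
        best_folder, best_score = None, -math.inf
        for folder, count in self.class_counts.items():
            score = math.log(count / seen)  # log prior
            denom = self.total_words[folder] + vocab_size
            for word in words:
                # Laplace (add-one) smoothed log likelihood
                score += math.log((self.word_counts[folder][word] + 1) / denom)
            if score > best_score:
                best_folder, best_score = folder, score
        return best_folder


def prequential_accuracy(stream, model):
    """Test-then-train: predict each message first, then learn from it."""
    correct = total = 0
    for text, folder in stream:
        words = text.lower().split()  # assumed, very naive tokenizer
        if model.predict(words) == folder:
            correct += 1
        total += 1
        model.learn(words, folder)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy stream of (message text, folder) pairs, invented for illustration.
    toy_stream = [
        ("meeting agenda monday", "work"),
        ("lunch pizza friday", "personal"),
        ("meeting budget agenda", "work"),
        ("pizza party friday", "personal"),
    ]
    print(prequential_accuracy(toy_stream, OnlineNaiveBayes()))
```

Because every message is used for testing before it is used for training, the accuracy reflects performance on unseen data without a separate hold-out set, which suits a stream where messages arrive continuously.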
Similar resources
Using GNUsmail to Compare Data Stream Mining Methods for On-line Email Classification
Real-time classification of emails is a challenging task because of its online nature, and also because email streams are subject to concept drift. Identifying email spam, where only two different labels or classes are defined (spam or not spam), has received great attention in the literature. We are nevertheless interested in a more specific classification where multiple folders exist, which i...
Polarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm
Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes are very important and influential on the Earth and life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...
Structural Macroeconomic Capacity to a Reaction in Economic Policy Shocks: The Case of Iran
The aim of this paper is to simulate the effects of some macroeconomic policy tools on production and inflation in Iran during the current worldwide financial and real crisis. The theoretical framework of the analysis is based on the so-called "IMF/World Bank Integrated Model", which is the synthesis (a merger) of the basic monetary approach of the Balance of Payments used at t...
The Role of Geography in the Formation of Courtyard Types in Traditional Iranian Houses
Extended Abstract. Introduction: The formation of built spaces in combination with open spaces is one of the important subjects in designing architectural spaces. Different factors were important in the formation of open spaces in traditional architecture. The function of a building was one of these factors, because the quality of the design of courtyards and their elements depended on the ...
AGN Zoo and Classifications of Active Galaxies
We review the variety of Active Galactic Nuclei (AGN) classes (the so-called "AGN zoo") and classification schemes of galaxies by activity type, based on their optical emission-line spectra as well as on other parameters and wavelength ranges other than optical. A historical overview of discoveries of various types of active galaxies is given, including Seyfert galaxies, radio galaxies, QSOs, BL Lace...
Publication date: 2010